67 research outputs found

Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab

Author: Gao Yingming
Publication date: 04/08/2022

Articulatory copy synthesis (ACS), a subarea of speech inversion, refers to the reproduction of natural utterances and involves both the physiological articulatory processes and their corresponding acoustic results. This thesis proposes two novel methods for the ACS of human speech using the articulatory speech synthesizer VocalTractLab (VTL) to address or mitigate the existing problems of speech inversion, such as non-unique mapping, acoustic variation among different speakers, and the time-consuming nature of the process. The first method involved finding appropriate VTL gestural scores for given natural utterances using a genetic algorithm. It consisted of two steps: gestural score initialization and optimization. In the first step, gestural scores were initialized from the given acoustic signals using speech recognition, grapheme-to-phoneme (G2P) conversion, and a VTL rule-based method for converting phoneme sequences to gestural scores. In the second step, the initial gestural scores were optimized by a genetic algorithm via an analysis-by-synthesis (ABS) procedure that sought to minimize the cosine distance between the acoustic features of the synthetic and natural utterances. The articulatory parameters were also regularized during the optimization process to restrict them to reasonable values. The second method was based on long short-term memory (LSTM) and convolutional neural networks, which were responsible for capturing the temporal dependence and the spatial structure of the acoustic features, respectively. Neural network regression models were trained to take acoustic features as inputs and produce articulatory trajectories as outputs. In addition, to cover as much of the articulatory and acoustic space as possible, the training samples were augmented by manipulating the phonation type, speaking effort, and the vocal tract length of the synthetic utterances.
Furthermore, two regularization methods were proposed: one based on the smoothness loss of articulatory trajectories and another based on the acoustic loss between original and predicted acoustic features. The best-performing genetic algorithms and convolutional LSTM systems (evaluated in terms of the difference between the estimated and reference VTL articulatory parameters) obtained average correlation coefficients of 0.985 and 0.983 for speaker-dependent utterances, respectively, and their reproduced speech achieved recognition accuracies of 86.25% and 64.69% for speaker-independent utterances of German words, respectively. When applied to German sentence utterances, as well as English and Mandarin Chinese word utterances, the neural network based ACS systems achieved recognition accuracies of 73.88%, 52.92%, and 52.41%, respectively. The results showed that both of these methods reproduced not only the articulatory processes but also the acoustic signals of the reference utterances. Moreover, the regularization methods led to more physiologically plausible articulatory processes and brought the estimated articulatory trajectories closer to those preferred by VTL, thus reproducing more natural and intelligible speech. This study also found that the convolutional layers, when used in conjunction with batch normalization layers, automatically learned more distinctive features from log power spectrograms. Furthermore, the neural network based ACS systems trained using German data could be generalized to the utterances of other languages.
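
The abstract mentions a cosine distance between acoustic features and a smoothness loss on articulatory trajectories but does not give formulas. The following is a minimal sketch of how such quantities might be computed, not the thesis's implementation. The function names, the per-frame averaging, the zero-vector convention, and the squared first-difference form of the smoothness term are all assumptions made for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumed convention for all-zero frames
    return 1.0 - dot / (norm_a * norm_b)


def mean_frame_distance(synthetic, natural):
    """Average per-frame cosine distance between two frame sequences.

    Assumes both utterances are already time-aligned and have the same
    number of frames (hypothetical simplification).
    """
    if len(synthetic) != len(natural) or not natural:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(cosine_distance(s, n) for s, n in zip(synthetic, natural))
    return total / len(natural)


def smoothness_penalty(trajectory):
    """Mean squared frame-to-frame change of one articulatory parameter."""
    diffs = [(b - a) ** 2 for a, b in zip(trajectory, trajectory[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0
```

In an analysis-by-synthesis loop, a candidate gestural score would be synthesized, its features compared with those of the natural utterance via `mean_frame_distance`, and the candidate with the lowest combined cost kept. How the terms are weighted is not stated in the abstract.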

Technische Universität Dresden: Qucosa

Recent Results on Balanced Symmetric Boolean Functions

Author: Guangpu Gao
Yaqun Zhao
Yingming Guo
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 23/04/2012

In this paper we prove that all balanced symmetric Boolean functions of fixed degree are trivial when the number of variables grows large enough. We also present the nonexistence of trivial balanced elementary symmetric Boolean functions except for $n = l \cdot 2^{t+1} - 1$ and $d = 2^t$, where $t$ and $l$ are any positive integers, which shows that Cusick's conjecture for balanced elementary symmetric Boolean functions is exactly the conjecture that all balanced elementary symmetric Boolean functions are trivial balanced. In addition, we obtain an integer $n_0$, which depends only on $d$, such that Cusick's conjecture holds for any $n > n_0$.
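
As a small illustration of the parameters named in the abstract, the sketch below checks by direct counting whether the elementary symmetric Boolean function of degree d in n variables is balanced. It relies on the standard fact that this function evaluates to C(w, d) mod 2 on an input of Hamming weight w. The function name is hypothetical, and the code is not taken from the paper.

```python
from math import comb


def is_balanced_elementary_symmetric(n, d):
    """Check whether the degree-d elementary symmetric Boolean function
    in n variables takes the value 1 on exactly half of all 2**n inputs.

    On an input of Hamming weight w the function equals C(w, d) mod 2,
    and there are C(n, w) inputs of that weight.
    """
    ones = sum(comb(n, w) for w in range(n + 1) if comb(w, d) % 2 == 1)
    return 2 * ones == 2 ** n


# Parameters of the form n = l * 2**(t + 1) - 1, d = 2**t from the abstract:
print(is_balanced_elementary_symmetric(3, 2))  # t = 1, l = 1
print(is_balanced_elementary_symmetric(7, 2))  # t = 1, l = 2
# A pair outside that family:
print(is_balanced_elementary_symmetric(4, 2))
```

The counting approach only needs n + 1 binomial coefficients, so it stays practical well beyond the range where enumerating all 2^n inputs would be feasible.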

Cryptology ePrint Archive

A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis

Author: Gao Yingming
Han Yichen
Li Ya
Wang Songpo
Xue Jinlong
Yang Lei
Publication date: 07/10/2022

Audio driven talking head synthesis is a challenging task that has attracted increasing attention in recent years. Although existing methods based on 2D landmarks or 3D face models can synthesize accurate lip synchronization and rhythmic head pose for arbitrary identities, they still have limitations, such as the cut feeling in the mouth mapping and the lack of skin highlights. The morphed region is blurry compared to the surrounding face. A Keypoint Based Enhancement (KPBE) method is proposed for audio driven free view talking head synthesis to improve the naturalness of the generated video. First, existing methods were used as the backend to synthesize intermediate results. Then we used keypoint decomposition to extract video synthesis controlling parameters from the backend output and the source image. After that, the controlling parameters were composited to the source keypoints and the driving keypoints. A motion field based method was used to generate the final image from the keypoint representation. With keypoint representation, we overcame the cut feeling in the mouth mapping and the lack of skin highlights. Experiments show that our proposed enhancement method improved the quality of talking-head videos in terms of mean opinion score.

arXiv.org e-Print Archive

Class I histone deacetylases (HDAC1-3) are histone lysine delactylases

Author: Bolding Julie E.
Bæk Michael
Danková Daniela
Gao Jinjun
Jameson Samuel T.
Liu Wenchao
Moreno-Yruela Carlos
Nielsen Alexander L.
Olsen Christian A.
Wei Wei
Wong Jiemin
Yang Lu
Zhang Di
Zhao Yingming
Publication date: 12/02/2024

Lysine L-lactylation [K(L-la)] is a newly discovered histone mark stimulated under conditions of high glycolysis, such as the Warburg effect. K(L-la) is associated with functions that are different from the widely studied histone acetylation. While K(L-la) can be introduced by the acetyltransferase p300, histone delactylase enzymes remained unknown. Here, we report the systematic evaluation of zinc- and nicotinamide adenine dinucleotide-dependent histone deacetylases (HDACs) for their ability to cleave ε-N-L-lactyllysine marks. Our screens identified HDAC1-3 and SIRT1-3 as delactylases in vitro. HDAC1-3 show robust activity toward not only K(L-la) but also K(D-la) and diverse short-chain acyl modifications. We further confirmed the de-L-lactylase activity of HDACs 1 and 3 in cells. Together, these data suggest that histone lactylation is installed and removed by regulatory enzymes as opposed to spontaneous chemical reactivity. Our results therefore represent an important step toward full characterization of this pathway's regulatory elements.

Knowledge UChicago

Crystal structure of rhodopsin bound to arrestin by femtosecond X-ray laser.

Author: Barty Anton
Basu Shibom
Boutet Sébastien
Caffrey Martin
Caro Lydia N
Carragher Bridget
Chapman Henry N
Cherezov Vadim
Coe Jesse
Conrad Chelsie E
de Waal Parker W
Diederichs Kay
Dong Yuhui
Ernst Oliver P
Fromme Petra
Fromme Raimund
Gao Xiang
Gati Cornelius
Griffin Patrick R
Grotjohann Ingo
Gu Xin
Gurevich Vsevolod V
Han Gye Won
He Yuanzheng
Howe Nicole
Hubbell Wayne L
Ishchenko Andrii
James Daniel
Jiang Hualiang
Jiang Yi
Kang Yanyong
Katritch Vsevolod
Ke Jiyuan
Kupitz Christopher
Lee Regina J
Li Dianfan
Li Jun
Lisova Stella
Liu Haiguang
Liu Wei
Ma Jinming
Melcher Karsten
Messerschmidt Marc
Moeller Arne
Pal Kuntal
Pascal Bruce D
Potter Clinton S
Roy-Chowdhury Shatabdi
Spence John CH
Standfuss Jörg
Stevens Raymond C
Suino-Powell Kelly M
Tan MH Eileen
Tan Minjia
Van Eps Ned
Vishnivetskiy Sergey A
Wang Dingjie
Wang Meitian
Weierstall Uwe
West Graham M
White Thomas A
Williams Garth J
Xu H Eric
Xu Qingping
Yang Huaiyu
Yefanov Oleksandr
Zatsepin Nadia A
Zhang Chenghai
Zhao Yingming
Zheng Zhong
Zhi Xiaoyong
Zhou X Edward
Publication venue: eScholarship, University of California
Publication date: 01/01/2015

G-protein-coupled receptors (GPCRs) signal primarily through G proteins or arrestins. Arrestin binding to GPCRs blocks G protein interaction and redirects signalling to numerous G-protein-independent pathways. Here we report the crystal structure of a constitutively active form of human rhodopsin bound to a pre-activated form of the mouse visual arrestin, determined by serial femtosecond X-ray laser crystallography. Together with extensive biochemical and mutagenesis data, the structure reveals an overall architecture of the rhodopsin-arrestin assembly in which rhodopsin uses distinct structural elements, including transmembrane helix 7 and helix 8, to recruit arrestin. Correspondingly, arrestin adopts the pre-activated conformation, with a ∼20° rotation between the amino and carboxy domains, which opens up a cleft in arrestin to accommodate a short helix formed by the second intracellular loop of rhodopsin. This structure provides a basis for understanding GPCR-mediated arrestin-biased signalling and demonstrates the power of X-ray lasers for advancing the frontiers of structural biology.

DESY Publication Database

PubMed Central

eScholarship - University of California

DESY

Genome editing reveals dmrt1 as an essential male sex-determining gene in Chinese tongue sole (Cynoglossus semilaevis)

Author: Chen Songlin
Cheng Christopher
Cui Zhongkai
Dong Zhongdian
Gao Fengtao
Guo Hua
Hu Mengzhu
Li Hailong
Li Yangzhen
Lin Fan
Liu Yang
Liu Yun
Meng Liang
Schartl Manfred
Shao Changwei
Wang Na
Wang Qian
Wang Wenwen
Wei Min
Wei Zhanfei
Yang Yingming
Zhang Ning
Zhu Ying
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Chinese tongue sole is a marine fish with ZW sex determination. Genome sequencing suggested that the Z-linked dmrt1 is a putative male determination gene, but direct genetic evidence is still lacking. Here we show that TALEN-mediated targeting of dmrt1 efficiently induced mutations of this gene. The ZZ dmrt1 mutant fish developed ovary-like testes, and spermatogenesis was disrupted. Expression of the female-related genes foxl2 and cyp19a1a was significantly increased in the gonad of the ZZ dmrt1 mutant. Conversely, expression of the male-related genes Sox9a and Amh was significantly decreased. The dmrt1-deficient ZZ fish grew much faster than ZZ male controls. Notably, we obtained an intersex ZW fish with a testis on one side and an ovary on the other side. This fish was chimeric, with a dmrt1 mutation in the ovary and wild-type dmrt1 in the testis. Our data provide the first functional evidence that dmrt1 is a male determining gene in tongue sole.

Texas A&M Repository

PubMed Central

Direct and indirect effects of climate on richness drive the latitudinal diversity gradient in forest trees

Author: Abiem Iveren
Alonso Alfonso
Anderson-Teixeira Kristina J.
Baltzer Jennifer L.
Bourg Norm
Burslem David F. R. P.
Cao Min
Chapman Hazel
Chu Chengjin
Condit Richard
de Oliveira Alexandre A.
Fang Suqin
Fischer Gunter A.
Gao Lianming
Hao Zhanqin
Hau Billy C. H.
He Fangliang
He Qing
Hector Andrew
Hubbell Stephen P.
Jiang Mingxi
Jin Guangze
Kenfack David
Kral Kamil
Lai Jiangshan
Li Buhang
Li Xiankun
Li Yide
Lian Juyu
Lin Luxiang
Liu Yankun
Liu Yu
Luo Yahuang
Lutz James A.
Ma Keping
McShea William
Memiaghe Herve
Mi Xiangcheng
Mittelbach Gary G.
Myers Jonathan A.
Ni Ming
O'Brien Michael J.
Orwig David A.
Parker Geoffrey G.
Qiao Xiujuan
Ren Haibao
Reynolds Glen
Sang Weiguo
Shen Guochun
Storch David
Su Zhiyao
Sui Xinghua
Sun I-Fang
Tian Songyan
Vrska Tomas
Wang Bin
Wang Xihua
Wang Xugao
Wang Youshi
Weiblen George D.
Wen Shujun
Xi Nianxun
Xiang Wusheng
Xu Han
Xu Kun
Ye Wanhui
Yin Xue
Zhang Bingwei
Zhang Jiaxin
Zhang Xiaotong
Zhang Yingming
Zhu Kai
Zimmerman Jess
Publication venue: Wiley
Publication date: 01/01/2019

Data accessibility statement: Full census data are available upon reasonable request from the ForestGEO data portal, http://ctfs.si.edu/datarequest/. We thank Margie Mayfield, three anonymous reviewers and Jacob Weiner for constructive comments on the manuscript. This study was financially supported by the National Key R&D Program of China (2017YFC0506100), the National Natural Science Foundation of China (31622014 and 31570426), and the Fundamental Research Funds for the Central Universities (17lgzd24) to CC. XW was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB3103). DS was supported by the Czech Science Foundation (grant no. 16-26369S). Yves Rosseel provided valuable suggestions on using the lavaan package for conducting SEM analyses. Funding and citation information for each forest plot is available in the Supplementary Information Text 1. Peer reviewed. Postprint.

Aberdeen University Research Archive

Crossref

of Botany, Chinese Academy of Sciences

Institute of Hydrobiology, Chinese Academy of Sciences

Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab

Author: Gao Yingming
Publication date: 04/08/2022

HSSS - Hochschulschriftenserver der SLUB

Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab

Author: Gao Yingming
Publication date: 04/08/2022

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Computational Modelling of Tone Perception Based on Direct Processing of f0 Contours

Author: Yi Xu
Yingming Gao
Yue Chen
Publication venue: MDPI AG
Publication date: 01/03/2022

It has been widely assumed that in speech perception it is imperative to first detect a set of distinctive properties or features and then use them to recognize phonetic units like consonants, vowels, and tones. Those features can be auditory cues or articulatory gestures, or a combination of both. However, there have been no clear demonstrations of how exactly such a two-phase process would work in the perception of continuous speech. Here we used computational modelling to explore whether it is possible to recognize phonetic categories from syllable-sized continuous acoustic signals of connected speech without intermediate featural representations. We used Support Vector Machine (SVM) and Self-organizing Map (SOM) to simulate tone perception in Mandarin, by either directly processing f0 trajectories or extracting various tonal features. The results show that direct tone recognition not only yields better performance than any of the feature extraction schemes, but also requires less computational power. These results suggest that prior extraction of features is unlikely to be the operational mechanism of speech perception.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

PubMed Central

UCL Discovery